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ABSTRACT 


-->  This  paper  describes  a  outpoint  method  for  sampling  from  an 
n-point  discrete  distribution  that  preserves  the  monotone  rela¬ 
tionship  between  a  uniform  deviate  and  the  random  variate  it  gen¬ 
erates.  This  property  is  useful  when  developing  a  sampling  plan 
to  reduce  variance  in  a  Monte  Carlo  or  simulation  study.  The 
alias  sampling  method  generally  lacks  this  property  and  requires 
2n  storage  locations  while  the  proposed  outpoint  sampling  method 
requires  m+n  storage  locations,  where  m  denotes  the  number  of 
outpoints.  The  expected  number  of  comparisons  with  this  method 
is  derived  and  shown  to  be  bounded  above  by  (m  ♦  n  -  l)/n.  The 
paper  describes  an  algorithm  to  implement  the  proposed  method  as 
well  as  two  modifications  for  cases  in  which  n  is  large  and  pos¬ 
sibly  infinite. 
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I.  INTRODUCTION 


Let  X  be  a  discrete  random  variable  on  the  integers  l,...,n 
with  probability  mass  function  (p^;  i*l,...,n}  and  distribution 
function  (d.f.) 

«0  ■  0 

q.  ■  q.  ,  +  p.  i»l,...,n.  (1.1) 

’i-l  ri 

One  straightforward  way  to  sample  from  {q^;  i*l,...,n}  is  to  sam¬ 
ple  U  from  the  uniform  distribution  on  [0,1)  and  then  determine  X 
from 

X  •  min  {i:  q^  >.  U}.  (1.2) 

If  the  d.f.  of  X  is  stored  in  a  table  beforehand,  this  procedure, 
known  as  the  inverse  transform  method,  requires  n  storage  8pacea 
and  EX  comparisons  on  average,  which  may  prove  costly  when  n  is 
moderate  or  large. 

At  present,  the  most  efficient  way  to  sample  repeatedly  from 
the  d.f.  of  X  is  the  alias  method  of  Walker  (1974a,  1974b,  1977). 
See  also  Rromaal  and  Peterson  (1979).  This  method  requires  stor¬ 
age  for' two  arrays,  the  aliases  A^,...,An  and  alias  probabili¬ 
ties  {r^  ■  pr(X«ilL*i);  i"l,...,n)  where  for  i*l,...,n 
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pr(L*i)  ■  1/n 

pr(X-ilL-i)  +  pr(X-A.lL-i)  »  1  (1.3) 

and 

l  n 

p.  -  pr(X-i)  »  —  l  pr(X»iiL*j). 
n  j-1 

Prior  to  beginning  the  sampling  experiment  one  chooses  these  ar¬ 
rays  to  satisfy  (1.3).  After  sampling  U  one  computes  L  ■  InUJ  ♦  1 
and  selects  X  *  L  if  >  U(mod  1/n)  or  otherwise  selects  X  “  AL. 
Here  t©  J  denotes  the  largest  integer  less  than  or  equal  to  6.  Note 
that  only  one  comparison  is  required  to  generate  each  X.  The 
arrays  determined  by  (1.3)  require  2n  storage  locations  and  are 
not  unique.  If  one  wishes  to  retain  the  tabled  values  of  (q^}t  an 
additional  n  storage  locations  are  required. 

Although  the  time  independent  nature  of  the  alias  method  has 
clear  appeal,  the  method  has  two  limitations  that  deserve 
attention: 

a.  In  general,  the  alias  method  does  not  preserve  a  monotone 
relationship  between  U  and  X  as  does  the  inverse  transform  method 
(1.2). 

b.  The  allocation  of  2n  storage  spaces  may  be  infeasible 
either  due  to  the  magnitude  of  n  or  the  requirements  of  other 
steps  in  the  program  in  which  the  alias  method  is  imbedded. 
Furthermore,  if  one  wishes  to  maintain  the  table  of  {q^},  3n 
storage  locations  are  required. 
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While  the  issue  of  storsge  requirements  is  self  explanatory, 
the  significance  of  the  monotone  property  needs  clarification. 

Let  Yj  and  Y2  be  random  variables  with  d.fs.  and  F2  and  inverse 
d.fs.  Gj(u)  ■  min(y:  F^(y)  u)  and  G^u)  =  min(y:  F^(y)  £  u) 
respectively.  Then  the  minimal  correlation  between  and  Y^  occurs 
for  Y^  ■  Gj(U)  and  Y^  *  G^d-U).  The  result  is  due  to  Hoeffding 
(1940).  See  also  Whitt  (1976).  In  Monte  Carlo  sampling  and  dis¬ 
crete  event  simulation  one  often  wants  to  make  use  of  the  minimal 
correlation  property  to  induce  a  variance  reduction  for  a  given 
sampling  cost.  More  generally,  one  often  can  achieve  a  variance 
reduction  by  appropriate  use  of  the  sequence  =  U  +  (k-l)/r  (mod  1) 
for  k*l,...,r  with  a  sampling  technique  that  preserves  monotoni¬ 
city.  See  Hammersley  and  Handscomb  (1964)  and  Fishman  and  Huang 
(1960).  The  alias  method  may  prevent  one  from  effecting  this 
reduction  in  variance. 

2.  THE  OUTPOINT  METHOD 

We  now  describe  the  cutpoint  method  for  sampling  from  the 
d.f.  of  X.  The  procedure  preserves  monotonicity,  maintains  the 
table  of  {q^}  and  allows  the  user  to  adjust  space  and  time  re¬ 
quirements  to  sccoomodate  the  global  needs  of  the  problem  setting. 

The  procedure  again  uses  the  inverse  transform  approach  but  with 
more  information  computed  beforehand,  as  in  the  alias  method. 

The  proposed  method  is  not  new  having  been  described  in  Chen  and 
Asau  (1974).  However,  the  present  paper  is  the  first  to  study 
the  tradeoff  between  computation  time  and  space  analytically. 


For  a  given  positive  integer  n,  define  the  outpoints 


*  min  {i:  q^^  >  U-l)/m)  t-l,...,m  (2.1) 

lm*l  *  n‘ 

Let  L  ■  (mUl  so  that 

pr(IL  <  X  <  IL+1>  *  1  (2.2) 

where  X  is  as  defined  in  (1.2)  and  161  denotes  the  smallest  in¬ 
teger  greater  than  or  equal  to  6.  The  maximal  number  of  com¬ 
parisons  needed  to  determine  X  exactly  is  I^+1  -  1^  +  1  and  the 
expected  maximal  number  of  comparisons  is 

C  •  (I  .  -  1,  ♦  a)/m.  (2.3) 

m,n  m+i  l 

The  storage  requirements  are  m+n  locations  with  the  tabled  d.f. 
of  X  comprising  n  of  them. 

Note  that  C  is  less  than  2,  implying  that  the  cutpoint 
n,n 

method  requires  less  than  one  additional  comparison  on  average  to 
preserve  monotonicity  when  compared  to  the  alias  method  with  the 
same  allocation  of  storage,  tf  the  d.f.  table  of  X  also  is  to  be 
maintained  in  the  alias  method,  then  for  equal  storage  for  the 
cutpoint  method  C, _  is  less  than  1.5  so  that  at  most  1/2  of  an 

40  )H 

additional  comparison  is  needed  on  average. 

A  more  revealing  evaluation  arises  if  the  d.f.  of  X  is 
directly  taken  into  account.  Let  Ja  denote  the  number  of  com¬ 
parisons  on  a  trial.  Than  J  equals  X  -  I,  ♦  1,  where  X  is 
determined  as  in  (1.2),  and  has  expectation 
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llK'-L'-l'E"..  J  UJMB-fl' 


EJ  -  1  +  EX  -  El, 
m  L 

m  (2.4) 

-  1  +  EX  -  I  IJm  . 

S.-1  1 

Table  1  illustrates  the  proposed  method  using  the  eight-point 
distribution  in  Fishman  (1978,  p.  459). 


Insert  Table  1  about  here 


If  space  is  not  an  issue  then  the  alias  method  is  the  procedure  of 

choice.  One  then  may  view  EJ  -  1  as  the  cost  of  maintaining  the 

m 

monotone  property. 

3.  THE  CASE  OF  LARGE  n 

If  {q^}  has  infinite  support  (n*»  •»)  then  neither  the  alias 
method  nor  our  cutpoint  method  alone  suffices  to  perform  sampling. 
This  insufficiency  also  may  occur  if  n  is  merely  large  relative 
to  space  availability.  Here  Kronmal  and  Peterson  (1979)  suggest 
using  the  alias  method  "for  a  finite  (but  large)  range  of  the  de¬ 
sired  discrete  distribution  and  a  special  tail-generating  method 
for  the  tail  beyond".  Ahrens  and  Dieter  (1973)  discuss  the  tail¬ 
generating  methods  for  several  common  parametric  families  of 
distributions.  More  recently  Ahrens  and  Kobrt  (1981)  described  a 
cutpoint  method  with  a  more  dense  frequency  of  cutpoints  in  the 
tails. 

Our  own  proposals  for  this  situation  take  two  forms.  Note 

that  in  principle  (2.4)  suggests  that  EJ  may  be  determined  for 

in 

infinite  n  if  EX  is  finite.  Consider  n*  <  n  such  that  a  pro¬ 
cedure  is  available  for  computing  (q^;  i  >  n*} .  The  cutpoint 
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method  then  applies  directly  with  tables  used  for  {q^;  i  <  n*} 
and  the  available  procedure  for  i  >  n*.  The  mean  number  of  com¬ 
parisons  remains  the  same  while  the  mean  cost  in  time  is  propor¬ 
tional  to 


V .  -  EJ  +  c,  E(J  IX  >  n*)pr(X  >  n*)  (3.1) 

l  m  i  m 

where  c^  is  the  (computer  dependent)  increase  in  time  required  to 
evaluate  q^  for  i  >  n*  relative  to  the  time  required  for  a  table 
lookup  of  q^  when  i  £  n*. 

As  an  alternative,  suppose  that  in  addition  to  tabling 
{I  ;  1*1,... ,m}  we  elect  to  table  {q  ,  q  ,...,q  }  instead  of 

1  h  h 

{qj,  q2...,qn)  .  To  sample  X,  one  proceeds,  as  before. 


to  select  I  but  now  calculates  q 
L  I. 


q_  , as  needed. 


L+l  "L+2 

Such  calculations  may  be  faster  and  more  accurate  due  to  the 


nearby  starting  value  q  .  The  mean  cost  in  time  is  propor- 
tional  to 


V2  -  (l  ♦  c2>EJm  ~  c2  (3.2) 

where  c2  is  Che  (computer  dependent)  relative  increase  in  time 
required  to  evaluate  q.  for  i  +  I  as  compared  to  the  time  re- 
quired  for  a  table  lookup  of  q  . 

h 

Clearly  u2  £  if  and  only  if 

c,  E(J  -  1)  -  c.  E(J  IX  >  n*)  pr(X  >  n*)  <  0  .  (3.3) 

a  q  in  *• 

In  addition,  the  second  procedure  requires  2m  storage  locations 

while  the  first  requires  m+n*.  For  most  applications  c^  >  c2  >  0 

and  as  n*  increases  the  left  hand  side  of  expression  (3.3)  mo- 

notonically  increases  from  a  negative  value  of  (c2  -  c^)EJq  -  c2> 


when  n*  *  0,  to  a  positive  limit  of  c.  EJ  -  c_,  if  EX  <  •  . 

2  m  2 

Again,  the  user  is  faced  with  a  tradeoff  between  time  and  space 
requirements . 

These  remarks  are  intended  to  be  suggestive  rather  than  de¬ 
finitive.  Individual  decisions  should  be  guided  by  the  require¬ 
ments  and  resources  of  the  application  intended.  Although  the 
dominance  of  any  of  the  methods  described  here  with  respect  to 
time  and  storage  remains  a  question,  one  should  keep  in  mind  that 
all  the  cutpoint  methods  described  preserve  the  monotone  property. 

4.  ALGORITHMS 

In  this  section  we  present  algorithms  for  implementing  the 
cutpoint  method  when  sampling  from  the  d.f.  of  the  random 
variable  X.  It  is  assumed  throughout  that  Q  is  the  name  of  an 
array  or  function  such  that  Q(i)  ■  q^.  In  addition  x  denotes 
the  smallest  integer  greater  than  or  equal  to  x. 

Given  the  positive  integer  M"m,  the  algorithm  CMSET  in  Figure  1 
returns  the  array  (l(i)  *  I:  £*1,...,M},  as  described  in  Sec- 
tion  2.  If  desired,  the  array  (QlU)  ■  q  ;  1*1,..., M}  suitable 
for  use  as  described  in  Section  3  is  also  returned;  otherwise,  one 
deletes  statement  8  of  CMSET.  Note  that  if  the  d.f.  of  X  is 
tabled,. the  array  Q  is  not  destroyed  by  CMSET. 

Figure  1  About  Here 


Algorithm  CM  in  Figure  2  enables  one  to  sample  for  X,  once 


the  setup  in  algorithm  CMSET  has  been  effected.  An  algorithm  for 
sampling  from  the  uniform  distribution  on  [0,1)  is  used  in  the 


first  step  of  CM.  The  random  variable  X  need  not  have  finite 
support  in  order  for  CM  to  function  cor  ectly.  The  value  of  X 
upon  return  from  CM  is  the  variate  desired. 


Figure  2  About  Here 


Algorithms  for  the  setup  and  use  of  the  alias  method  are 
given  in  Kronmal  and  Peterson  (1979)  and  Ahrens  and  Kohrt 
(1981).  Care  must  be  taken  with  these  algorithms  to  preserve 
{q.}  and  avoid  the  use  of  large  arrays  during  initialization. 
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TABLE  1.  EXAMPLE  OF  OUTPOINT  METHOD 

i 

Pi 

qi 

m 

m+n 

c 

m,n 

EJ 

IS 

1 

.01 

.01 

i 

9 

8.00 

5.31 

2 

.04 

.05 

2 

10 

4.50 

3.31 

3 

.07 

.12 

3 

11 

3.33 

2.31 

4 

.15 

.27 

4 

12 

2.75 

2.06 

5 

.28 

.55 

5 

13 

2.40 

1.71 

6 

.19 

.74 

6 

14 

2.17 

1.64 

7 

.21 

.95 

7 

15 

2.00 

1.45 

8 

.05 

1.00 

8 

10 

1.88 

1.44 

16 

24 

1.44 

1.19 
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FIGURE  1.  ALGORITHM  CMSET 


1.  L  *  0. 

2.  J  *  0. 

3.  J  «■  J+l. 

4.  QJ  ♦  M*Q( J) . 

5.  IF  QJ  _<  L  THEN  GO  TO  3. 

6.  L  ♦  L+l. 

7.  I(L)  «-  J. 

8.  QI(L)  ♦  QJ/M. 

9.  IF  L<M  THEN  GO  TO  5. 

10.  RETURN. 


FIGURE  2.  ALGORITHM  CM 


.  SAMPLE  U  FROM  UNIFORM  (0,1). 

2.  x  ♦  i( Tm*u1) . 

3.  IF  U  <.  Q(X)  THEN  RETURN. 

4.  X  ♦  X+l. 


5 


GO  TO  3 
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